ABSTRACT

Criterion-referenced tests (CRTs) are constructed to
permit the interpretation of examinee test performance in relation
to a set of well-defined competencies. CRTs are currently used
extensively in schools, industry, and the armed services because they
provide valuable and different information from norm-referenced
tests. Test publishers, school districts, and state departments of
education produce CRTs; however, many of the available tests fall far
short of the technical quality necessary for them to accomplish their
intended purposes. This digest provides practitioners and test
developers with guidelines for evaluating CRTs. Drawn from the
Standards for Educational and Psychological Testing, 25 content and
technical questions are presented that must be answered when
evaluating criterion-referenced tests. The technology for preparing
CRTs is now well developed, and practitioners can avoid improperly
prepared tests by addressing these questions. (BS)

Criterion-referenced tests (CRTs) are constructed to permit
the interpretation of examinee test performance in relation to a
set of well-defined competencies (Popham, 1978). CRT scores have
three common uses:
1. to describe examinee performance in relation to competencies
of interest;
2. to assign examinees to mastery states (e.g., "masters" and
"non-masters"), for each competency of interest, or in relation
to a group of competencies defining a domain of content; and
3. to describe the performance of specified groups of examinees
in program evaluation studies.
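
The second use, assigning examinees to mastery states, can be illustrated with a minimal Python sketch. It assumes that items are scored 0 or 1, that each competency is measured by its own group of items, and that mastery means reaching a proportion-correct cut-off. The function name `classify_mastery`, the 0.8 cut-off, and the example competencies are hypothetical choices made for illustration; they are not taken from this digest or from any particular test.

```python
from typing import Dict, List


def classify_mastery(item_scores: Dict[str, List[int]],
                     cutoff: float = 0.8) -> Dict[str, str]:
    """Assign a mastery state per competency from 0/1 item scores.

    `cutoff` is an assumed proportion-correct standard; a real test
    would document how its standard was set.
    """
    states = {}
    for competency, scores in item_scores.items():
        if not scores:
            raise ValueError(f"no items scored for competency {competency!r}")
        proportion_correct = sum(scores) / len(scores)
        states[competency] = (
            "master" if proportion_correct >= cutoff else "non-master"
        )
    return states


# Hypothetical examinee: five items per competency, 1 = correct.
examinee = {
    "fractions": [1, 1, 1, 0, 1],  # 4 of 5 correct
    "decimals": [1, 0, 0, 1, 0],   # 2 of 5 correct
}
result = classify_mastery(examinee)
print(result)
```

With the assumed cut-off of 0.8, this examinee would be classified as a master of the first competency and a non-master of the second.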
CRTs are currently used extensively in schools, industry, and the
armed services because they provide valuable information that
differs from the information provided by norm-referenced tests
(NRTs). But CRTs, like other data-collection instruments used in
educational decision-making, are of variable quality, and lesser
quality tests are not going to fully meet the informational needs
of users. This digest was prepared to help practitioners identify
high quality criterion-referenced tests. Of course the same
guidelines should be useful to test developers as well.

BACKGROUND 

Most of the major test publishers have available an assortment of
criterion-referenced tests for assessing reading, mathematics,
language arts, and other content areas in grades K to 12. In
addition, many local school districts, state departments of
education, and smaller test publishers have produced their own
criterion-referenced tests. Many of the available tests, however,
fall far short of the technical quality necessary for them to
accomplish their intended purposes. When tests lack sufficient
technical quality, there are a number of plausible explanations:
For one, many of the available criterion-referenced tests were
developed before an adequate testing technology was fully
explicated. Fortunately, an adequate technology for constructing
criterion-referenced tests and using criterion-referenced test
scores is now available (Berk, 1984; Hambleton, in press;
Hambleton, Swaminathan, Algina, & Coulson, 1978; Popham, 1978).
Guidelines can be produced by which criterion-referenced tests and
their manuals can be evaluated. The recently published Standards
for Educational and Psychological Testing (1985) for evaluating
tests and test manuals, prepared by a joint committee of AERA,
APA, and NCME, is helpful, too, and was used in preparing the next
section.

TEST EVALUATION

There are 25 content and technical questions that must be answered
when evaluating criterion-referenced tests, commercially prepared
or otherwise.

Content Questions 

1. Do the competencies measured by the test cover the content
domain of interest?

2. Are the competencies themselves well-defined so that the
appropriate domain of content for each competency is clear?

3. Is there a capability of adding to or taking away from the test
content so that the final test provides a suitable match to the
content domain of interest?

4. Is an appropriate rationale offered for the selection of
competencies measured in the test?

5. Is the test-item content appropriate to measure the
competencies?

Technical Questions 

6. Do the test items meet the standard item-writing principles?

7. Are the test items free from bias and stereotyping?

8. Is each group of test items measuring a competency
representative of the domain of content spanned by the competency?

9. Was the item-review process carried out properly?

10. Was a suitable sample of examinees used to pilot the test
items?

11. Were item statistics used correctly in building the test?

12. Do the test directions address important information such as
test purpose, scoring, time limits, passing score(s), and marking
answer sheets (or test booklets)?

13. Are the time limits sufficient for examinees to complete the
test?

14. Are the test administrator's directions complete so as to
insure a proper test administration?

15. Are the print size, quality of printing and artwork, and page
layouts appropriate for the examinees?

16. Are the reliability and validity studies conducted with large
enough samples of examinees for whom the test is intended?

17. Are useful reliability indices, such as "decision-consistency"
and "kappa," reported for the test scores?

18. Are the reliability indices high enough to justify the use of
the test in the intended application?

19. Are personal and environmental factors that influence test
performance addressed in the test manual?

20. Is a test manual available that addresses test purposes,
development, administration, scoring, psychometric properties of
the test scores, and test interpretations?

21. Is there justification offered (and is it appropriate) for the
choice of standard (or cut-off score)?

22. Is the process used to set a standard fully documented in the
manual, and is it appropriate?

23. Is there acceptable and fully documented validity evidence for
the intended uses of the test scores?

24. Are there cautions in the technical manual on the size of
errors of measurement and/or misclassification and the role of
these errors in score interpretations?

25. Are the test scores reported fully and clearly?

Clarification and expansion of many of the questions above can be
found in Berk (1984), Hambleton (in press), and Popham (1978).
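
The "decision-consistency" and "kappa" indices named in question 17 can be illustrated with a short Python sketch. It assumes that the same examinees have been classified as masters or non-masters on two occasions (for example, two administrations or two parallel forms). Decision consistency is taken here as the proportion of examinees classified the same way both times, and kappa as Cohen's chance-corrected version of that agreement. The function name and the example data are hypothetical, and a test manual may estimate these indices by other methods.

```python
from typing import Sequence, Tuple


def decision_consistency(first: Sequence[bool],
                         second: Sequence[bool]) -> Tuple[float, float]:
    """Return (p0, kappa) for two sets of mastery classifications.

    True means "master", False means "non-master". The two sequences
    must list the same examinees in the same order.
    """
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(first)

    # Observed agreement: same classification on both occasions.
    p0 = sum(a == b for a, b in zip(first, second)) / n

    # Chance agreement from the marginal proportions of masters.
    m1 = sum(first) / n
    m2 = sum(second) / n
    pc = m1 * m2 + (1 - m1) * (1 - m2)
    if pc == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")

    kappa = (p0 - pc) / (1 - pc)
    return p0, kappa


# Hypothetical classifications of ten examinees on two occasions.
occasion_1 = [True] * 6 + [False] * 4
occasion_2 = [True] * 5 + [False, True] + [False] * 3
p0, kappa = decision_consistency(occasion_1, occasion_2)
print(round(p0, 2), round(kappa, 2))
```

In this made-up example eight of the ten examinees receive the same classification on both occasions, so the observed agreement is 0.80, while kappa is lower because some of that agreement would be expected by chance alone.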

CONCLUDING REMARKS

Identifying well-constructed, reliable, and valid
criterion-referenced tests is essential for insuring that the
purposes of a testing program are accomplished. The importance of
the 25 individual questions above will vary somewhat from one test
to another. Still, some attention to each question in
criterion-referenced test evaluation would normally be desirable.
The technology for preparing criterion-referenced tests is
well-developed at this time. Practitioners should expect that the
technology will be used and used correctly in preparing tests, and
when it is not, these improperly prepared tests should be avoided.

Ronald K. Hambleton,
University of Massachusetts